Efficient decoding strategies for conversational speech recognition using a constrained nonlinear state-space model

Authors

  • Jeff Z. Ma
  • Li Deng
Abstract

In this paper, we present two efficient strategies for likelihood computation and decoding in a continuous speech recognizer whose hidden speech dynamics are described by an underlying nonlinear state-space dynamic model. The state-space model has been specially constructed to suit the conversational or casual style of speech, where phonetic reduction abounds. Two specific decoding algorithms, based on optimal state-sequence estimation for the nonlinear state-space model, are derived, implemented, and evaluated. Both overcome the exponential growth in the number of original search paths by using path-merging approaches derived from Bayes' rule. We have tested and compared the two algorithms on speech data from the Switchboard corpus, confirming their effectiveness. Conversational speech recognition experiments on the Switchboard corpus further demonstrated that the new decoding strategies reduce the recognizer's word error rate compared with two baseline recognizers, an HMM system and the nonlinear state-space model using HMM-produced phonetic boundaries, under identical test conditions.
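The effect of path merging on the size of the search can be illustrated with a toy example. The sketch below is not the authors' algorithm: it assumes a small set of discrete regimes with hypothetical per-frame log scores (`frame_scores`) and transition log scores (`trans`), and merges all paths that end in the same regime by keeping only the best one, in the style of a Viterbi recursion. In this toy the score is first-order Markov, so merging is exact; with a continuous hidden state whose estimate depends on the whole path history, merging would generally be an approximation.

```python
import itertools


def exhaustive_best(frame_scores, trans):
    """Score every regime sequence explicitly: K**T paths for T frames."""
    T, K = len(frame_scores), len(frame_scores[0])
    best_score, best_path, n_paths = float("-inf"), None, 0
    for path in itertools.product(range(K), repeat=T):
        score = frame_scores[0][path[0]]
        for t in range(1, T):
            score += trans[path[t - 1]][path[t]] + frame_scores[t][path[t]]
        n_paths += 1
        if score > best_score:
            best_score, best_path = score, path
    return best_score, best_path, n_paths


def merged_best(frame_scores, trans):
    """Merge paths ending in the same regime, keeping K hypotheses per frame."""
    T, K = len(frame_scores), len(frame_scores[0])
    # One hypothesis per regime: (accumulated log score, regime path so far).
    hyps = [(frame_scores[0][k], (k,)) for k in range(K)]
    for t in range(1, T):
        new_hyps = []
        for k in range(K):
            # Of all paths entering regime k, only the best one survives.
            score, path = max(
                (hyps[j][0] + trans[j][k], hyps[j][1]) for j in range(K)
            )
            new_hyps.append((score + frame_scores[t][k], path + (k,)))
        hyps = new_hyps
    best_score, best_path = max(hyps)
    return best_score, best_path, len(hyps)


if __name__ == "__main__":
    # Hypothetical log scores for 3 frames and 2 regimes (illustration only).
    frame_scores = [[-1.0, -2.0], [-0.5, -3.0], [-2.0, -0.1]]
    trans = [[-0.1, -2.0], [-2.0, -0.1]]
    print(exhaustive_best(frame_scores, trans))
    print(merged_best(frame_scores, trans))
```

The exhaustive search scores K**T paths, while the merged search carries only K hypotheses from frame to frame, which is the kind of saving that makes decoding tractable as utterances grow longer.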

Related resources

Efficient decoding strategy for conversational speech recognition using state-space models for vocal-tract-resonance dynamics

In this paper, we present an efficient strategy for likelihood computation and decoding in a continuous speech recognizer using underlying state-space dynamic models for the hidden speech dynamics. The state-space models have been constructed in a special way so as to be suitable for the conversational or casual style of speech where phonetic reduction abounds. The interacting multiple model (I...

Using Continuous Space Language Models for Conversational Speech Recognition

Language modeling for conversational speech suffers from the limited amount of available adequate training data. This paper describes a new approach that performs the estimation of the language model probabilities in a continuous space, allowing by these means smooth interpolation of unobserved n-grams. This continuous space language model is used during the last decoding pass of a state-of-the...

Conversational telephone speech recognition

This paper describes the development of a speech recognition system for the processing of telephone conversations, starting with a state-of-the-art broadcast news transcription system. We identify major changes and improvements in acoustic and language modeling, as well as decoding, which are required to achieve state-of-the-art performance on conversational speech. Some major changes on the aco...

New pruning criteria for efficient decoding

In large vocabulary continuous speech recognizers the search space needs to be constrained efficiently to make the recognition task feasible. Beam pruning and restricting the number of active paths are the most widely applied techniques for this. In this paper, we present three additional pruning criteria, which can be used to further limit the search space. These new criteria take into account...

Attention shift decoding for conversational speech recognition

We introduce a novel approach to decoding in speech recognition (termed attention-shift decoding) that attempts to mimic aspects of human speech recognition responsible for robustness in processing conversational speech. Our approach is a radical departure from traditional decoding algorithms for speech recognition. We propose a method to first identify reliable regions of the speech signal and...

Journal:
  • IEEE Trans. Speech and Audio Processing

Volume 11

Publication date: 2003